Popularity and average top 10% percentile count over time
We can see that the share of reviews going to the top 5%, and especially the top 1%, of podcasts started increasing significantly after 2018. This suggests that the investments paid off at least to some degree.
Some questions we need to consider: if Spotify's investment in podcasts (both in specific shows and in overall infrastructure) resulted in significant user growth:
- Did most of these new users disproportionately listen to the most expensive/most popular podcasts?
- If so, did these new users later engage with other podcasts as well?
- Were the new users retained over a longer period, or was there a significant drop-off?
Did most of the growth go to the top 1% of podcasts?
Hypothesis I
Did Spotify's investment and overall strategy of focusing on a small number of creators prove effective? Specifically, did the popularity of the most popular podcasts (defined as the top 1% by number of reviews) grow faster than that of other podcasts? Based on this question, we formulate our first hypothesis:
H1: The number of reviews for the most popular podcasts is increasing at a faster rate than for the bottom 99% of all podcasts.
To test this hypothesis, follow these steps:
- Transform the `reviews_by_month_count_df_after_2015` dataframe to show the monthly growth rate for the top 1% and bottom 99% of podcasts.
| | count |
|---|---|
| 0 | 1981420 |
Growth rate for top 1%:
Mean: 0.08 (Std Dev: 0.44)
Growth rate for bottom 99%:
Mean: 0.02 (Std Dev: 0.14)
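The transformation described above can be sketched as follows. This is a minimal, dependency-free illustration and not the notebook's actual code: the helper name `monthly_growth_rates` and the monthly counts are hypothetical, and growth is assumed to mean the month-over-month relative change in review count.

```python
from statistics import mean, stdev


def monthly_growth_rates(counts):
    """Month-over-month relative change: (current - previous) / previous."""
    rates = []
    for prev, curr in zip(counts, counts[1:]):
        if prev == 0:
            continue  # growth is undefined when the previous month had no reviews
        rates.append((curr - prev) / prev)
    return rates


# Hypothetical monthly review counts, NOT taken from the dataset.
top_1_counts = [100, 120, 90, 135]
bottom_99_counts = [1000, 1020, 1010, 1030]

top_1_growth = monthly_growth_rates(top_1_counts)
bottom_99_growth = monthly_growth_rates(bottom_99_counts)

print(f"Top 1%:     mean={mean(top_1_growth):.2f} std={stdev(top_1_growth):.2f}")
print(f"Bottom 99%: mean={mean(bottom_99_growth):.2f} std={stdev(bottom_99_growth):.2f}")
```

With a pandas dataframe, the same per-group calculation would typically be a `pct_change()` call on the monthly count column.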
To choose an appropriate test, we first check the assumptions of a parametric test such as the two-sample t-test:
- The data should be normally distributed, or the sample size should be large.
- The variances of the two groups being compared should be equal.
If these assumptions do not hold, a non-parametric test such as the Mann-Whitney U test should be used.
Shapiro-Wilk Test for Normality:
Test Stat: 0.76
P-value: 0.0 (If p-value < 0.05, data is not normally distributed).
This indicates that a non-parametric test should be used.
Levene Test for Homogeneity of Variances:
Test Stat: 40.53
P-value: 0.0
This also supports the decision to use a non-parametric test.
Mann-Whitney U Test:
U-value: 40.53
P-value: 0.0
The p-value indicates a significant difference in growth rates between the top 1% and bottom 99%. This supports the initial hypothesis.
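To make the test concrete, here is a minimal sketch of how a Mann-Whitney U statistic and an approximate two-sided p-value can be computed by hand. It is an illustration under stated assumptions, not the routine used to produce the numbers above (which was presumably a library function such as `scipy.stats.mannwhitneyu`): the function name and the sample growth rates are hypothetical, and the p-value uses the normal approximation without a tie correction.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value; ties count as half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2  # expected U if both groups share one distribution
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Hypothetical growth rates, NOT taken from the dataset.
top_1_sample = [0.30, 0.25, 0.40, 0.35, 0.50]
bottom_99_sample = [0.01, 0.02, 0.03, 0.015, 0.025]

u_value, p_value = mann_whitney_u(top_1_sample, bottom_99_sample)
print(f"U-value: {u_value}, P-value: {p_value:.4f}")
```

Because every pair contributes 0, 0.5, or 1, a U statistic computed this way is always a multiple of 0.5 and lies between 0 and `len(x) * len(y)`.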
Comparison of Average Growth Rates:
Average growth for top 1%: 0.08
Average growth for the bottom 99%: 0.02
Distribution of Podcasts by Popularity:
Gini coefficient is: 0.9302907076746617
We can further see that reviews are distributed extremely unevenly across podcasts. Specifically, the top 1% of all podcasts by review count account for 57% of all reviews.
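For reference, both concentration measures can be computed from a plain list of per-podcast review counts. The sketch below is illustrative only: the helper names `gini` and `top_share` and the sample counts are hypothetical, and the Gini coefficient uses the standard rank-weighted formula on sorted values, which may differ from the notebook's implementation.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values, with ranks 1..n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values, fraction):
    """Share of the total held by the top `fraction` of items."""
    xs = sorted(values, reverse=True)
    k = max(1, int(len(xs) * fraction))  # always include at least one item
    return sum(xs[:k]) / sum(xs)


# Hypothetical review counts: 99 podcasts with 1 review, one with 901.
review_counts = [1] * 99 + [901]

print(f"Gini coefficient: {gini(review_counts):.3f}")
print(f"Top 1% share of reviews: {top_share(review_counts, 0.01):.1%}")
```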
User Engagement Analysis
What are the user listening patterns?
Distribution of review count by user:
- mean: 1.34
- median: 1.00
- stdev: 1.83
- skewness: 98.11
- kurtosis: 21137.37

Skewness measures the direction and degree of asymmetry; a positive skew indicates that the tail is on the right side of the distribution. The value here is extremely high and indicates a very strong right skew: most of the values are clustered on the left, with a few extremely large values on the right. High kurtosis means that more of the variance is the result of infrequent extreme deviations.
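As a worked illustration of these two statistics, the sketch below computes moment-based skewness and excess kurtosis for a small sample. The function name and the data are hypothetical. It uses plain population moments, whereas library routines (for example in pandas) apply bias corrections, so this would not reproduce the exact figures reported above.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew, excess_kurtosis


# Hypothetical sample: nine users with 1 review each and one user with 11.
reviews_per_user = [1] * 9 + [11]

skew, kurt = skewness_and_kurtosis(reviews_per_user)
print(f"skewness: {skew:.2f}, excess kurtosis: {kurt:.2f}")
```

Even in this tiny sample a single heavy reviewer pulls both statistics well above zero, which is the same pattern, at a much smaller scale, as in the distribution described above.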
Podcast Genre Analysis
| | category | podcast_id | rating | content_length | title_length | review_count |
|---|---|---|---|---|---|---|
| 0 | true-crime | bf5bf76d5b6ffbf9a31bba4480383b7f | 4.353402 | 265.356595 | 12.0 | 31010 |
| 1 | true-crime | bc5ddad3898e0973eb541577d1df8004 | 3.686242 | 305.848232 | 61.0 | 9587 |
| 2 | comedy | bc5ddad3898e0973eb541577d1df8004 | 3.686242 | 305.848232 | 61.0 | 9587 |
| 3 | news | f5fce0325ac6a4bf5e191d6608b95797 | 3.839367 | 219.777151 | 20.0 | 7265 |
| 4 | true-crime | b1a3eb2aa8e82ecbe9c91ed9a963c362 | 4.247432 | 249.893554 | 19.0 | 6717 |
| ... | ... | ... | ... | ... | ... | ... |
| 212367 | leisure-animation-manga | f0e247111c8985e0c5e14cc8d6442f09 | 5.000000 | 173.000000 | 22.0 | 1 |
| 212368 | leisure | d0bc8c5bf6f0f1eeda8d5c1c8b38adc9 | 5.000000 | 119.000000 | 17.0 | 1 |
| 212369 | leisure-animation-manga | f1bf522813566465708ba99c92813c84 | 5.000000 | 481.000000 | 17.0 | 1 |
| 212370 | judaism | f564e91cdf68e9c51a40fc38b73da7b6 | 5.000000 | 228.000000 | 24.0 | 1 |
| 212371 | history | c3f0fe1ab04701f43cc02fa0316d23cf | 5.000000 | 27.000000 | 17.0 | 1 |
212372 rows × 6 columns
Total Unique Podcasts 110024
| | category | review_count | unique_podcasts |
|---|---|---|---|
| 79 | society-culture | 329054 | 13731 |
| 16 | comedy | 306950 | 11864 |
| 103 | true-crime | 154221 | 1264 |
| 20 | education | 145413 | 8827 |
| 68 | religion-spirituality | 141541 | 12095 |
| 104 | tv-film | 133763 | 6469 |
| 8 | business | 116883 | 8072 |
| 86 | sports | 113116 | 7266 |
| 59 | news | 103378 | 4297 |
| 30 | health-fitness | 96948 | 6050 |
| 15 | christianity | 84668 | 7954 |
| 0 | arts | 84494 | 6078 |
| 41 | kids-family | 66247 | 2383 |
| 38 | history | 57816 | 1663 |
| 46 | leisure | 54142 | 4178 |
| | top_level_category | review_count | unique_podcasts |
|---|---|---|---|
| 19 | society | 437998 | 19441 |
| 4 | comedy | 333317 | 12803 |
| 2 | business | 223394 | 12931 |
| 5 | education | 217351 | 13005 |
| 8 | health | 184358 | 8731 |
| 21 | sports | 184103 | 9280 |
| 16 | news | 179549 | 6606 |
| 24 | tv | 168752 | 8285 |
| 23 | true-crime | 154221 | 1264 |
| 17 | religion | 143055 | 12246 |
| 0 | arts | 141137 | 9675 |
| 14 | leisure | 99622 | 7011 |
| 13 | kids | 88307 | 3321 |
| 3 | christianity | 84668 | 7954 |
| 15 | music | 60633 | 6161 |
Index(['category', 'podcast_id', 'rating', 'content_length', 'title_length',
'review_count', 'top_level_category'],
dtype='object')
array(['society', 'comedy', 'business', 'education', 'health', 'sports',
'news', 'tv', 'true-crime', 'religion', 'arts', 'leisure', 'kids',
'christianity', 'music'], dtype=object)
array(['true-crime', 'comedy', 'news', 'society', 'kids', 'education',
'religion', 'sports', 'tv', 'health', 'business', 'music', 'arts',
'christianity', 'leisure'], dtype=object)
Hypothesis II
We can see that the proportion of reviews belonging to the top 1% of podcasts varies widely between genres. Based on this, we can check which categories saw the largest increase in the share of reviews going to the top 1%.
| | category | podcast_id | year_month | review_count | top_level_category |
|---|---|---|---|---|---|
| 0 | business | a00018b54eb342567c94dacfb2a3e504 | 2017-10-31 | 1 | business |
| 1 | christianity | a00043d34e734b09246d17dc5d56f63c | 2019-09-30 | 1 | christianity |
| 2 | religion-spirituality | a00043d34e734b09246d17dc5d56f63c | 2019-09-30 | 1 | religion |
| 3 | religion-spirituality | a0004b1ef445af9dc84dad1e7821b1e3 | 2011-08-31 | 1 | religion |
| 4 | spirituality | a0004b1ef445af9dc84dad1e7821b1e3 | 2011-08-31 | 1 | spirituality |
| ... | ... | ... | ... | ... | ... |
| 1247729 | news | ffff32caeedd6254573ad1cc49852595 | 2018-02-28 | 1 | news |
| 1247745 | arts | ffff5db4b5db2d860c49749e5de8a36d | 2011-05-31 | 1 | arts |
| 1247759 | comedy | ffff66f98c1adfc8d0d6c41bb8facfd0 | 2018-09-30 | 4 | comedy |
| 1247761 | education | ffff923482740bc21a0fe184865ec2e2 | 2018-04-30 | 1 | education |
| 1247763 | comedy | ffffbd44ec5f79d502f16ae372bf2d4f | 2021-08-31 | 1 | comedy |
151349 rows × 5 columns
Index(['category', 'podcast_id', 'year_month', 'review_count',
'top_level_category'],
dtype='object')
| | top_level_category | post_cutoff | is_top_1_percent | review_count | total | prop_of_all_reviews |
|---|---|---|---|---|---|---|
| 1 | arts | False | True | 3574 | 16557 | 0.215860 |
| 3 | arts | True | True | 2285 | 9112 | 0.250768 |
| 5 | buddhism | False | True | 47 | 184 | 0.255435 |
| 8 | business | False | True | 9377 | 33327 | 0.281363 |
| 10 | business | True | True | 4451 | 19714 | 0.225779 |
| 12 | christianity | False | True | 1757 | 10361 | 0.169578 |
| 14 | christianity | True | True | 2201 | 7094 | 0.310262 |
| 16 | comedy | False | True | 8684 | 30434 | 0.285339 |
| 18 | comedy | True | True | 5584 | 15611 | 0.357696 |
| 20 | education | False | True | 6855 | 24218 | 0.283054 |
| 22 | education | True | True | 3860 | 19103 | 0.202063 |
| 24 | fiction | False | True | 841 | 2451 | 0.343125 |
| 26 | fiction | True | True | 1084 | 3467 | 0.312662 |
| 28 | government | False | True | 594 | 1936 | 0.306818 |
| 30 | government | True | True | 127 | 857 | 0.148191 |
| 32 | health | False | True | 4417 | 17304 | 0.255259 |
| 34 | health | True | True | 2611 | 14157 | 0.184432 |
| 36 | hinduism | False | True | 12 | 34 | 0.352941 |
| 39 | history | False | True | 1541 | 4142 | 0.372042 |
| 41 | history | True | True | 393 | 2545 | 0.154420 |
| 43 | islam | False | True | 23 | 257 | 0.089494 |
| 45 | islam | True | True | 33 | 124 | 0.266129 |
| 47 | judaism | False | True | 40 | 246 | 0.162602 |
| 49 | judaism | True | True | 28 | 258 | 0.108527 |
| 51 | kids | False | True | 1808 | 7357 | 0.245752 |
| 53 | kids | True | True | 1572 | 5565 | 0.282480 |
| 55 | leisure | False | True | 3168 | 10201 | 0.310558 |
| 57 | leisure | True | True | 1052 | 6776 | 0.155254 |
| 59 | music | False | True | 2106 | 8978 | 0.234573 |
| 61 | music | True | True | 1551 | 4620 | 0.335714 |
| 63 | news | False | True | 3627 | 11611 | 0.312376 |
| 65 | news | True | True | 3560 | 10340 | 0.344294 |
| 67 | religion | False | True | 3255 | 17109 | 0.190251 |
| 69 | religion | True | True | 2993 | 10715 | 0.279328 |
| 71 | science | False | True | 1011 | 3372 | 0.299822 |
| 73 | science | True | True | 294 | 2073 | 0.141823 |
| 75 | society | False | True | 13055 | 41444 | 0.315003 |
| 77 | society | True | True | 6256 | 26072 | 0.239951 |
| 79 | spirituality | False | True | 1083 | 4115 | 0.263183 |
| 81 | spirituality | True | True | 688 | 2726 | 0.252384 |
| 83 | sports | False | True | 5028 | 16407 | 0.306455 |
| 85 | sports | True | True | 2759 | 11894 | 0.231966 |
| 87 | technology | False | True | 1646 | 6719 | 0.244977 |
| 89 | technology | True | True | 445 | 1915 | 0.232376 |
| 91 | true-crime | False | True | 2325 | 5044 | 0.460944 |
| 93 | true-crime | True | True | 911 | 5503 | 0.165546 |
| 95 | tv | False | True | 4485 | 16644 | 0.269466 |
| 97 | tv | True | True | 2837 | 8515 | 0.333177 |
| | top_level_category | pre_cutoff_ratio | post_cutoff_ratio | pre_cutoff_review_count | post_cutoff_review_count | relative_change_in_ratio | sum_review_count |
|---|---|---|---|---|---|---|---|
| 0 | arts | 0.215860 | 0.250768 | 3574.0 | 2285.0 | 0.161715 | 5859.0 |
| 2 | business | 0.281363 | 0.225779 | 9377.0 | 4451.0 | -0.197555 | 13828.0 |
| 3 | christianity | 0.169578 | 0.310262 | 1757.0 | 2201.0 | 0.829611 | 3958.0 |
| 4 | comedy | 0.285339 | 0.357696 | 8684.0 | 5584.0 | 0.253585 | 14268.0 |
| 5 | education | 0.283054 | 0.202063 | 6855.0 | 3860.0 | -0.286134 | 10715.0 |
| 6 | fiction | 0.343125 | 0.312662 | 841.0 | 1084.0 | -0.088781 | 1925.0 |
| 7 | government | 0.306818 | 0.148191 | 594.0 | 127.0 | -0.517006 | 721.0 |
| 8 | health | 0.255259 | 0.184432 | 4417.0 | 2611.0 | -0.277472 | 7028.0 |
| 10 | history | 0.372042 | 0.154420 | 1541.0 | 393.0 | -0.584939 | 1934.0 |
| 11 | islam | 0.089494 | 0.266129 | 23.0 | 33.0 | 1.973703 | 56.0 |
| 12 | judaism | 0.162602 | 0.108527 | 40.0 | 28.0 | -0.332558 | 68.0 |
| 13 | kids | 0.245752 | 0.282480 | 1808.0 | 1572.0 | 0.149449 | 3380.0 |
| 14 | leisure | 0.310558 | 0.155254 | 3168.0 | 1052.0 | -0.500081 | 4220.0 |
| 15 | music | 0.234573 | 0.335714 | 2106.0 | 1551.0 | 0.431169 | 3657.0 |
| 16 | news | 0.312376 | 0.344294 | 3627.0 | 3560.0 | 0.102177 | 7187.0 |
| 17 | religion | 0.190251 | 0.279328 | 3255.0 | 2993.0 | 0.468210 | 6248.0 |
| 18 | science | 0.299822 | 0.141823 | 1011.0 | 294.0 | -0.526975 | 1305.0 |
| 19 | society | 0.315003 | 0.239951 | 13055.0 | 6256.0 | -0.238259 | 19311.0 |
| 20 | spirituality | 0.263183 | 0.252384 | 1083.0 | 688.0 | -0.041032 | 1771.0 |
| 21 | sports | 0.306455 | 0.231966 | 5028.0 | 2759.0 | -0.243067 | 7787.0 |
| 22 | technology | 0.244977 | 0.232376 | 1646.0 | 445.0 | -0.051437 | 2091.0 |
| 23 | true-crime | 0.460944 | 0.165546 | 2325.0 | 911.0 | -0.640854 | 3236.0 |
| 24 | tv | 0.269466 | 0.333177 | 4485.0 | 2837.0 | 0.236431 | 7322.0 |
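The `relative_change_in_ratio` column can be reproduced from the counts in the preceding table. The sketch below does so for the `arts` rows. The function names are hypothetical, and the formula (post-cutoff ratio minus pre-cutoff ratio, divided by the pre-cutoff ratio) is inferred from the tabulated values rather than taken from the notebook's code.

```python
def top_1_share(top_1_reviews, total_reviews):
    """Proportion of a category's reviews that belong to top 1% podcasts."""
    return top_1_reviews / total_reviews


def relative_change(pre_ratio, post_ratio):
    """Change in the ratio, expressed relative to the pre-cutoff ratio."""
    return (post_ratio - pre_ratio) / pre_ratio


# Counts for the "arts" category, copied from the table above.
pre_ratio = top_1_share(3574, 16557)   # pre-cutoff: top 1% reviews / all reviews
post_ratio = top_1_share(2285, 9112)   # post-cutoff: top 1% reviews / all reviews
change = relative_change(pre_ratio, post_ratio)

print(f"pre={pre_ratio:.6f} post={post_ratio:.6f} relative change={change:.6f}")
```

A positive value means the top 1% captured a larger share of the category's reviews after the cutoff, and a negative value means a smaller share.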
| | top_level_category | year_month_first | prop_top_1_percent_first | year_month_last | prop_top_1_percent_last | prop_change |
|---|---|---|---|---|---|---|
| 15 | music | 2005-11-30 | 0.0 | 2022-12-31 | 0.891667 | 0.891667 |
| 19 | society | 2005-11-30 | 0.0 | 2022-12-31 | 0.805195 | 0.805195 |
| 8 | health | 2005-11-30 | 0.0 | 2022-12-31 | 0.365854 | 0.365854 |
| 0 | arts | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 13 | kids | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 23 | true-crime | 2015-06-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 22 | technology | 2005-11-30 | 0.0 | 2022-10-31 | 0.000000 | 0.000000 |
| 21 | sports | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 20 | spirituality | 2005-11-30 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 18 | science | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 17 | religion | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 16 | news | 2005-11-30 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 14 | leisure | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 12 | judaism | 2005-12-31 | 0.0 | 2022-08-31 | 0.000000 | 0.000000 |
| 1 | buddhism | 2005-11-30 | 0.0 | 2022-05-31 | 0.000000 | 0.000000 |
| 11 | islam | 2005-12-31 | 0.0 | 2022-08-31 | 0.000000 | 0.000000 |
| 10 | history | 2005-11-30 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 9 | hinduism | 2006-11-30 | 0.0 | 2021-09-30 | 0.000000 | 0.000000 |
| 7 | government | 2005-11-30 | 0.0 | 2022-10-31 | 0.000000 | 0.000000 |
| 6 | fiction | 2005-12-31 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 5 | education | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 4 | comedy | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 3 | christianity | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 2 | business | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 24 | tv | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
'runs'
| | column_name | data_type |
|---|---|---|
| 0 | run_at | text |
| 1 | max_rowid | integer |
| 2 | reviews_added | integer |
'podcasts'
| | column_name | data_type |
|---|---|---|
| 0 | podcast_id | text |
| 1 | itunes_id | integer |
| 2 | slug | text |
| 3 | itunes_url | text |
| 4 | title | text |
'categories'
| | column_name | data_type |
|---|---|---|
| 0 | podcast_id | text |
| 1 | category | text |
'reviews'
| | column_name | data_type |
|---|---|---|
| 0 | author_id | text |
| 1 | podcast_id | text |
| 2 | created_at | text |
| 3 | title | text |
| 4 | content | text |
| 5 | rating | integer |
| 6 | created_at_dt | timestamp with time zone |